from __future__ import print_function, division
import thinkdsp
import thinkplot
import thinkstats2
import numpy as np
import scipy.fftpack
import warnings
warnings.filterwarnings('ignore')
import dct
%matplotlib inline
Exercise: In this chapter I claim that analyze1 takes time proportional
to $n^3$ and analyze2 takes time proportional to $n^2$. To
see if that's true, run them on a range of input sizes and time
them. In IPython, you can use the magic command %timeit.
If you plot run time versus input size on a log-log scale, you
should get a straight line with slope 3 for analyze1 and
slope 2 for analyze2. You also might want to test dct_iv
and scipy.fftpack.dct.
I'll start with a noise signal and an array of power-of-two sizes
signal = thinkdsp.UncorrelatedGaussianNoise()
noise = signal.make_wave(duration=1.0, framerate=16384)
noise.ys.shape
ns = 2 ** np.arange(6, 10)
ns
The following function takes an array of results from a timing experiment, plots the results, and fits a straight line.
def plot_bests(bests):
thinkplot.plot(ns, bests)
thinkplot.config(xscale='log', yscale='log', legend=False)
x = np.log(ns)
y = np.log(bests)
t = scipy.stats.linregress(x,y)
slope = t[0]
return slope
Here are the results for analyze1.
results = []
for N in ns:
print(N)
ts = (0.5 + np.arange(N)) / N
freqs = (0.5 + np.arange(N)) / 2
ys = noise.ys[:N]
result = %timeit -r1 -o dct.analyze1(ys, freqs, ts)
results.append(result)
bests = [result.best for result in results]
plot_bests(bests)
The estimated slope is close to 2, not 3, as expected. One possibility is that the performance of np.linalg.solve is nearly quadratic in this range of array sizes.
The line is curved, which suggests that we have not reached the array size where the runtime shows cubic growth. With larger array sizes, the estimated slope increases, so maybe it eventually converges on 3.
Here are the results for analyze2:
results = []
for N in ns:
ts = (0.5 + np.arange(N)) / N
freqs = (0.5 + np.arange(N)) / 2
ys = noise.ys[:N]
result = %timeit -r1 -o dct.analyze2(ys, freqs, ts)
results.append(result)
bests2 = [result.best for result in results]
plot_bests(bests2)
The results for analyze2 fall in a straight line with the estimated slope close to 2, as expected.
Here are the results for the scipy.fftpack.dct
results = []
for N in ns:
ys = noise.ys[:N]
result = %timeit -o scipy.fftpack.dct(ys, type=3)
results.append(result)
bests3 = [result.best for result in results]
plot_bests(bests3)
This implementation of dct is even faster. The line is curved, which means either we haven't seen the asymptotic behavior yet, or the asymptotic behavior is not a simple exponent of $n$. In fact, as we'll see soon, the run time is proportional to $n \log n$.
The following figure shows all three curves on the same axes.
thinkplot.preplot(3)
thinkplot.plot(ns, bests, label='analyze1',color='red')
thinkplot.plot(ns, bests2, label='analyze2',color='blue')
thinkplot.plot(ns, bests3, label='fftpack.dct',color='black')
thinkplot.config(xscale='log', yscale='log', legend=True, loc='upper left')
Exercise: One of the major applications of the DCT is compression for both sound and images. In its simplest form, DCT-based compression works like this:
Implement a version of this algorithm and apply it to a recording of music or speech. How many components can you eliminate before the difference is perceptible?
thinkdsp provides a class, Dct that is similar to a Spectrum, but which uses DCT instead of FFT.
As an example, I'll use a recording of a saxophone:
wave = thinkdsp.read_wave('music/193022__truthiswithin__tibetan-singing-bowl-struck.wav')
wave.make_audio()
Here's a short segment:
segment = wave.segment(start=1.0, duration=0.5)
segment.normalize()
segment.make_audio()
And here's the DCT of that segment:
seg_dct = segment.make_dct()
seg_dct.plot(high=8000)
thinkplot.config(xlabel='Frequency (Hz)', ylabel='DCT')
There are only a few harmonics with substantial amplitude, and many entries near zero.
The following function takes a DCT and sets elements below thresh to 0.
def compress(dct, thresh=1):
count = 0
for i, amp in enumerate(dct.amps):
if abs(amp) < thresh:
dct.hs[i] = 0
count += 1
n = len(dct.amps)
print(count, n, 100 * count / n, sep='\t')
If we apply it to the segment, we can eliminate more than 90% of the elements:
seg_dct = segment.make_dct()
compress(seg_dct, thresh=10)
seg_dct.plot(high=8000)
And the result sounds the same (at least to me):
seg2 = seg_dct.make_wave()
seg2.make_audio()
To compress a longer segment, we can make a DCT spectrogram. The following function is similar to wave.make_spectrogram except that it uses the DCT.
def make_dct_spectrogram(wave, seg_length):
"""Computes the DCT spectrogram of the wave.
seg_length: number of samples in each segment
returns: Spectrogram
"""
window = np.hamming(seg_length)
i, j = 0, seg_length
step = seg_length / 2
# map from time to Spectrum
spec_map = {}
while j < len(wave.ys):
segment = wave.slice(i, j)
segment.window(window)
# the nominal time for this segment is the midpoint
t = (segment.start + segment.end) / 2
spec_map[t] = segment.make_dct()
i += step
j += step
return thinkdsp.Spectrogram(spec_map, seg_length)
Now we can make a DCT spectrogram and apply compress to each segment:
spectro = make_dct_spectrogram(wave,seg_length=1024)
for t, dct in sorted(spectro.spec_map.items()):
compress(dct, thresh=0.2)
In most segments, the compression is 75-80%.
To hear what it sounds like, we can convert the spectrogram back to a wave and play it.
wave2 = spectro.make_wave()
wave2.make_audio()
And here's the original again for comparison.
wave.make_audio()
As an experiment, you might try increasing thresh to see when the effect of compression becomes audible (to you).
Also, you might try compressing a signal with some noisy elements, like cymbals.